An empirical study toward dealing with noise and class imbalance issues in software defect prediction

Abstract

The quality of defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies raise concerns about dataset quality, because of inconsistencies between the bug/clean fix keywords in fault reports and the corresponding links in change management logs. The class imbalance (CI) problem is also a major challenge for SDP models. A defect prediction method trained on noisy, imbalanced data produces inconsistent and unsatisfactory results, so a combined analysis of noisy instances and the CI problem is required. To the best of our knowledge, insufficient work has been done on these aspects. In this paper, we deal with the impact of noise and the CI problem on five baseline SDP models; we manually added various levels of noise (0–80%) and identified its impact on the performance of those models. Moreover, we provide guidelines for the possible range of tolerable noise for the baseline models. We suggest an SDP model that has the highest noise tolerance and outperforms the other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce by 20–30% after adding 10–40% noisy instances. Similarly, for ROC (Receiver Operating Characteristic) values the reported figure is 40–50%. The suggested model avoids 40–60% noise as compared with the traditional models.
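The noise-injection setup described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the noise was added, so the uniform random label flipping used here is an assumption, and the names `inject_label_noise` and `tpr_fpr` are hypothetical helpers, not the authors' code. Labels are assumed binary (1 = defective, 0 = clean).

```python
import random


def inject_label_noise(labels, noise_level, seed=0):
    """Return a copy of `labels` with a fraction `noise_level` of entries flipped.

    Assumption: noise is modelled as uniform random flips of binary labels.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_level * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def tpr_fpr(y_true, y_pred):
    """Compute (TPR, FPR) for binary labels; a rate is 0.0 if its denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Toy imbalanced label vector: 2 defective modules out of 10.
    clean_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    for level in (0.0, 0.2, 0.4, 0.8):
        noisy = inject_label_noise(clean_labels, level)
        flipped = sum(a != b for a, b in zip(clean_labels, noisy))
        print(f"noise={level:.0%} flipped={flipped} noisy_labels={noisy}")
```

In an experiment of the kind the abstract describes, a model would be trained on the noisy labels at each level and `tpr_fpr` would then be evaluated against clean test labels; the training step is omitted because the abstract does not name the five baseline models.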

Similar resources

Dealing with Class Imbalance using Thresholding

We propose thresholding as an approach to deal with class imbalance. We define the concept of thresholding as a process of determining a decision boundary in the presence of a tunable parameter. The threshold is the maximum value of this tunable parameter where the conditions of a certain decision are satisfied. We show that thresholding is applicable not only for linear classifiers but also fo...

An empirical study on software defect prediction with a simplified metric set

Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Ob...

Using Class Imbalance Learning for Cross-Company Defect Prediction

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or multiple projects of a source company and then applies the model to a target company. Unfortunately, the performance of such CCDP models is susceptible to the highly imbalanced nature of the defect-prone and non-defect classes of CC data. Class imbalance learning is applied to alleviat...

Dealing with Multiple Classes in Online Class Imbalance Learning

Online class imbalance learning deals with data streams having very skewed class distributions in a timely fashion. Although a few methods have been proposed to handle such problems, most of them focus on two-class cases. Multi-class imbalance imposes additional challenges in learning. This paper studies the combined challenges posed by multiclass imbalance and online learning, and aims at a mo...

Dealing with software design issues using an Agent-Oriented methodology

Journal

Journal title: Soft Computing

Year: 2021

ISSN: 1433-7479, 1432-7643

DOI: https://doi.org/10.1007/s00500-021-06096-3